Add Pyodide support and CI jobs for Zarr #1903
Conversation
Zarr v3 has …

Update: we added both … Next, when Zarr v3 (stable) comes out on PyPI in the next month or so, I shall bump the version/recipe for the in-tree Pyodide build as well. It would be great to have the out-of-tree build, i.e., this PR, ready for review/merging by then, and I am committed to putting my best effort into doing so – thanks!

The Pyodide 0.26.0 release is out, and we have been lucky to have got both …
This is because they test async code and also use threads. See the following issues: pyodide/pyodide#2221 pyodide/pyodide#237
Hi! I apologise for the radio silence on this PR. I've recently been exploring Pyodide/WASM support for Zarr again, as we aim to reintroduce Zarr to https://github.com/pyodide/pyodide-recipes. We will utilise that repository in the next release of Pyodide, namely version 0.28, scheduled for release in a few weeks – please see the tracking issue at pyodide/pyodide-recipes#99 for a subset of our remaining work. We currently use Python 3.13, and the first version of Zarr that supports it is v3, which meant that we needed to get this working – we had disabled Zarr due to the use of threads. I've tried to revisit support for Pyodide through the recent commits I've pushed, and I've been able to get it to work.
If I understand correctly, Zarr's inherent threading model in v3 can be bypassed in WebAssembly through the browser's/Node.js's event loop, as demonstrated in my changes to `src/zarr/core/sync.py`. The only route it disables is `zarr.sync`, and tests in `test_sync.py` are skipped as a result. If I understand correctly, `zarr.sync` is going away anyway in a later version of Zarr.
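For reference, the approach boils down to something like the condensed sketch below (simplified from the changes in `src/zarr/core/sync.py`; the `is_wasm` flag and the elided non-WASM fallback are assumptions of this sketch, not the PR's exact code):

```python
import asyncio
from collections.abc import Coroutine
from typing import Any, TypeVar

T = TypeVar("T")


def sync(coro: Coroutine[Any, Any, T], is_wasm: bool = False) -> T:
    """Run a coroutine to completion from synchronous code (condensed sketch)."""
    if is_wasm:
        # Under WebAssembly there is no background I/O thread; drive the
        # coroutine on the already-running browser/Node.js event loop instead.
        loop = asyncio.get_running_loop()
        return loop.run_until_complete(coro)
    # Outside WASM, Zarr submits the coroutine to a dedicated event loop running
    # in a separate thread; that machinery is elided from this sketch.
    raise NotImplementedError("non-WASM path elided in this sketch")
```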
This PR depends on its sister PR that I created at zarr-developers/numcodecs#529, where we're hoping to add Pyodide/WASM support for `numcodecs`. All tests have been passing there, and it is ready for further review. Here, I'm temporarily building `numcodecs` from my PR's branch there, as the version of `numcodecs` in the Pyodide distribution itself is old and does not satisfy the constraint here (>=0.16). I still haven't been able to figure out why the VCS-based versioning is not picking up the right version in CI – it looks like a bug where `actions/checkout@v4` does not fetch tags and does not perform an unshallow clone when checking out to a directory other than `{{ github.workspace }}`. Either way, I've temporarily skipped the version check for now so that we can let the tests run to the end.
Hence, while this is obviously not yet in a mergeable state, I've been able to get several tests passing and Zarr seems to be working well. I believe this PR is ready for review and further inspection, and it should enable us to integrate a portion of it as a patch for Pyodide, allowing Zarr to be re-enabled there once the idea receives peer review here. A full review can proceed in the meantime as I work through this PR.
In other respects, what I'm finding a bit concerning is the slowness of the test suite in CI. The CI job took 5 hours to run, which seems like an anomaly – in a Pyodide virtual environment on my macOS machine locally, I've consistently been able to run the tests to completion in ~11 minutes across multiple runs, rather than hours. `test_set_orthogonal_selection_3d` takes 6034 seconds here, more than 150× what it takes outside of CI. Would it make sense to add an additional `slow_in_wasm` marker so that `pytest` can ignore the tests shown below? (A sketch of such a marker follows the timings.)
============================= slowest 10 durations =============================
6034.84s call tests/test_indexing.py::test_set_orthogonal_selection_3d
4101.22s call tests/test_properties.py::test_array_creates_implicit_groups
3878.97s call tests/test_properties.py::test_array_roundtrip
924.30s call tests/test_properties.py::test_roundtrip_array_metadata_from_store
714.00s call tests/test_indexing.py::test_set_orthogonal_selection_2d
397.94s call tests/test_indexing.py::test_set_coordinate_selection_2d
242.29s call tests/test_indexing.py::test_set_orthogonal_selection_1d
150.35s call tests/test_indexing.py::test_set_block_selection_2d
121.05s call tests/test_indexing.py::test_set_block_selection_1d
95.83s call tests/test_indexing.py::test_get_orthogonal_selection_3d
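For illustration, such a marker could be wired up roughly as follows (the marker name, the `sys.platform` check, and the hook bodies are assumptions of this sketch, not code from the PR):

```python
# conftest.py (sketch): register a slow_in_wasm marker and skip marked tests
# when the suite runs under Emscripten/WASM.
import sys

import pytest

IS_WASM = sys.platform == "emscripten"  # assumption: how the flag is derived


def pytest_configure(config: pytest.Config) -> None:
    config.addinivalue_line(
        "markers", "slow_in_wasm: test is prohibitively slow under WASM/Pyodide"
    )


def pytest_collection_modifyitems(
    config: pytest.Config, items: list[pytest.Item]
) -> None:
    if not IS_WASM:
        return
    skip_slow = pytest.mark.skip(reason="too slow under WASM CI")
    for item in items:
        if "slow_in_wasm" in item.keywords:
            item.add_marker(skip_slow)
```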
I've also xfailed some flaky tests arising from Hypothesis, whose cause I haven't been able to figure out, and marked a Blosc test where I get mismatched arrays as a known failure case (this will need further probing into how Blosc was compiled to WASM) – I hope that is alright!
Out of the two failing tests in the CI job:

- `test_array_roundtrip` xfails on my local setup on several reruns due to Hypothesis having degraded performance when generating test cases, and it looks like it strict-xpassed here. It is likely a flake, too.
- I had introduced a lenience factor for `tests/test_group.py::test_group_members_performance[memory]` to accommodate the fact that it takes a little longer to run, likely due to overhead accumulated by running in WASM and the latency coming from the Node.js event loop. The lenience factor/tolerance of 1.1 is apparently good enough for my machine, but we know CI machines are quite unreliable, and the lenience factor required to make it pass here would be 4. I'm open to suggestions about this test. My suggestion would be to rework the test to run against multiple group counts (parametrised) and verify that the time taken doesn't scale linearly with the group count (i.e., times should be roughly similar if $O(1)$ and much larger for later counts if $O(n)$; the time ratio should ideally be much less than the linear scaling ratio). A rough sketch follows this list.
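Concretely, the reworked test could look roughly like this (the group counts, the in-memory setup, and the final ratio check are all illustrative assumptions, not a drop-in replacement for the existing test):

```python
import time

import pytest
import zarr

_timings: dict[int, float] = {}


@pytest.mark.parametrize("num_groups", [10, 100, 1000])
def test_group_members_scaling(num_groups: int) -> None:
    # Create `num_groups` child groups under an in-memory root group.
    root = zarr.group()
    for i in range(num_groups):
        root.create_group(f"g{i}")

    start = time.perf_counter()
    members = list(root.members())
    _timings[num_groups] = time.perf_counter() - start

    assert len(members) == num_groups
    # Once all parametrised runs have executed, compare timings across group
    # counts: the ratio of elapsed times should be far smaller than the ratio
    # of group counts if listing members is O(1) rather than O(n).
    if len(_timings) == 3:
        assert _timings[1000] < _timings[10] * (1000 / 10)
```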
Thank you for your time!
# For some reason fetch-depth: 0 and fetch-tags: true aren't working...
- name: Manually fetch tags for numcodecs
  working-directory: numcodecs-wasm
  run: git fetch --tags
For unexplained reasons, even this doesn't help.
# _numcodecs_version = Version(numcodecs.__version__)
# if _numcodecs_version < Version("0.13.0"):
#     raise RuntimeError(
#         "numcodecs version >= 0.13.0 is required to use the zstd codec. "
#         f"Version {_numcodecs_version} is currently installed."
#     )
Temporary change; these lines will be uncommented when we either have a solution for the `numcodecs` version being fetched in the CI job or we create yet another alpha with an updated version of `numcodecs`.
How about skipping the tests that require a newer version of numcodecs?
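(For reference, a skip along those lines might look like the sketch below; the 0.13.0 bound mirrors the commented-out check above, and the decorated test is hypothetical.)

```python
import numcodecs
import pytest
from packaging.version import Version

requires_numcodecs_0_13 = pytest.mark.skipif(
    Version(numcodecs.__version__) < Version("0.13.0"),
    reason="numcodecs >= 0.13.0 is required to use the zstd codec",
)


@requires_numcodecs_0_13
def test_zstd_roundtrip() -> None:  # hypothetical test name
    ...
```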
Oh, no, this is only to disable the version check. `numcodecs` is being compiled from zarr-developers/numcodecs#529 with the latest version; it's just that the reported version is incorrect.
"async": {"concurrency": 1 if IS_WASM else 10, "timeout": None}, | ||
"threading": {"max_workers": 1 if IS_WASM else None}, |
The idea here is to block spawning any new threads on any occasion, hence `max_workers` is set to `1`. Also, I'm not sure if `concurrency` should be set to `1` or if we can safely use `10`.
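For anyone trying this out, the same constraints can presumably also be applied at runtime through Zarr's config (a sketch; the keys mirror the defaults shown in the diff above):

```python
import zarr

# Constrain Zarr's async concurrency and thread pool for a WASM environment.
# Whether concurrency must be 1 or could safely stay at 10 is exactly the
# open question above.
zarr.config.set({
    "async.concurrency": 1,
    "threading.max_workers": 1,
})
```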
def _make_shutdown_asyncgens_noop_for_pyodide() -> None:
    """
    Patch Pyodide's WebLoop to fix interoperability with pytest-asyncio.

    WebLoop.shutdown_asyncgens() raises NotImplementedError, which causes
    pytest-asyncio to issue warnings during test cleanup and potentially
    cause resource leaks that make tests hang. This is a bit of a
    hack, but it allows us to run tests that use pytest-asyncio.

    This is necessary because pytest-asyncio tries to clean up async generators
    when tearing down test event loops, but Pyodide's WebLoop doesn't support
    this as it integrates with the browser's event loop rather than managing
    its own lifecycle.
    """
    try:
        if not IS_WASM and "pyodide" not in sys.modules:
            return
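(The quoted hunk is cut off above; for readers following along, a hedged sketch of how such a patch could be completed is below. The `pyodide.webloop` import path and the monkeypatching approach are assumptions of this sketch, not necessarily the PR's exact code.)

```python
import sys


async def _noop_shutdown_asyncgens(self) -> None:
    # Nothing to shut down: WebLoop delegates to the browser's/Node.js's event
    # loop rather than managing its own async-generator lifecycle.
    return None


def make_shutdown_asyncgens_noop_for_pyodide() -> None:
    """Replace WebLoop.shutdown_asyncgens with a harmless no-op (sketch)."""
    if "pyodide" not in sys.modules:
        return
    from pyodide.webloop import WebLoop  # import path assumed

    WebLoop.shutdown_asyncgens = _noop_shutdown_asyncgens
```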
The missing coverage annotations for this code are because Codecov does not know about the WASM CI job where these code paths are hit. I can mark them with a `# pragma: no cover` comment closer to the time of merging this PR.
"async": {"concurrency": 1 if IS_WASM else 10, "timeout": None}, | ||
"threading": {"max_workers": 1 if IS_WASM else None}, |
I have the same question as above about what an appropriate `async.concurrency` value would be.
Marking this as ready for review based on the above notes.
I can't request a review from you as you haven't been previously involved in this PR, @ryanking13, but it would be nice if you could take a look at it as well – thank you!
# Performance for Pyodide is only marginally worse than for Python,
# usually in the 1.012~1.018 range for a latency of 0.1). So we add
I would rather skip this test as it can be quite flaky.
# https://developer.mozilla.org/en-US/docs/Web/API/setTimeout
if IS_WASM:
    current_loop = asyncio.get_running_loop()
    return current_loop.run_until_complete(coro)
`run_until_complete` is a no-op in Pyodide unless JSPI is enabled. So I wonder whether it is better to use `run_until_complete` here or just raise an error.
I see; do we know of a way to check if JSPI is enabled? We don't have `--experimental-wasm-jspi` in https://github.com/pyodide/pyodide/blob/adfa092b26745a16127ce2ad680f81407f07738d/src/templates/python#L24-L26, so I thought it wasn't enabled, but it seems to be – I added a few debug print statements, and the currently running loop is indeed the `WebLoop`, and `run_until_complete` doesn't seem to be a no-op. I'm able to get an `AsyncArray` after `sync()` completes.
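For completeness, a loop-type probe along the lines of those debug checks could look like the sketch below (the `pyodide.webloop` import path is an assumption of this sketch; note that it checks for the WebLoop, not for JSPI itself):

```python
import asyncio
import sys


def running_on_pyodide_webloop() -> bool:
    """Best-effort check that Pyodide's WebLoop is the currently running loop."""
    if "pyodide" not in sys.modules:
        return False
    try:
        from pyodide.webloop import WebLoop  # import path assumed
    except ImportError:
        return False
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        # No event loop is running in this thread.
        return False
    return isinstance(loop, WebLoop)
```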
Patch Pyodide's WebLoop to fix interoperability with pytest-asyncio.

WebLoop.shutdown_asyncgens() raises NotImplementedError, which causes
Maybe we should implement it in Pyodide?
If this is only problematic in the testing environment, how about moving this into conftest.py?
> Maybe we should implement it in Pyodide?

I haven't thought about it, but yes, I think this could be helpful to put somewhere in `pytest-pyodide`, if I understand you correctly. I haven't come across this case/problem in any other downstream package, so I'm not sure (but perhaps Zarr is among the first).
> If this is only problematic in the testing environment, how about moving this into conftest.py?

Yes, makes sense to me! I'll move it to `conftest.py` and make it run only when `PYTEST_CURRENT_TEST` is active.
Description
This description is a stub and will be updated in due course.
TODO:
Additional context
Here are a few xrefs to some other PRs that have been implemented for packages in the Scientific Python ecosystem:

- PyWavelets (a dependency of `scikit-image`): Build and test PyWavelets Pyodide wheels in CI PyWavelets/pywt#701 and Upload dev wheels to Anaconda.org + revamp wheels publishing workflow PyWavelets/pywt#714
- `pandas` repository: BLD, TST: Build and test Pyodide wheels for `pandas` in CI pandas-dev/pandas#57896
- `scikit-image`: scikit-image/scikit-image#7350

The greater, long-term goal towards this change is to implement Sphinx-based interactive documentation via JupyterLite and WASM-powered, in-browser kernels, as referenced in Quansight-Labs/czi-scientific-python-mgmt#19; see also Quansight-Labs/czi-scientific-python-mgmt#18. A pilot that can be readily tried out is available for the "API reference" pages under the PyWavelets documentation. This will be preceded by configuring a place to publish these WebAssembly/Emscripten wheels nightly or on a scheduled cadence for Zarr (which could be a third-party, PyPI-like index such as Anaconda.org), and then integrating Sphinx extensions such as `jupyterlite-sphinx` for hosted documentation.